Fisher information is a measure of the best precision with which a parameter can be estimated
from statistical data. It can also be defined for a continuous random variable without reference to any
parameters, in which case it has a physically compelling interpretation of representing the highest
precision with which the first cumulant of the random variable, i.e., its mean, can be estimated
from its statistical realizations. We construct a complete hierarchy of information measures that
determine the best precision with which all of the cumulants of a random variable, and thus
its complete probability distribution, can be estimated from its statistical realizations. Several
properties of these information measures and their generating functions are discussed.

I. INTRODUCTION 

Fisher information […] constitutes a central concept in statistical estimation theory, which furnishes a variety of
useful estimates of deterministic parameters from statistical observations typical of a physical experiment. Its inverse
yields a lower bound, called the Cramer-Rao lower bound (CRLB), on the variance of any unbiased estimator of a
continuous parameter and thus limits the best precision with which the parameter can be extracted from statistical
measurements.

It is useful to regard a probability density function (pdf) of a random variable as being implicitly parameterized in
terms of a translational location parameter, e.g., its mean, median, or mode. The Fisher information relative to such
a purely translational parameter is easily seen to be independent of that parameter, and may be defined as the Fisher
information of the random variable [2] itself. The notion of Fisher information of a random variable has been applied
to the case in which the random variable is the sample-based mean. In the limit of large sample size, asymptotic
estimates have been obtained in terms of the cumulants of the underlying pdf and their derivatives.
In this correspondence, we present a further generalization of the Fisher information of a continuous random
variable. Instead of using only implicit parameterizations, we explicitly parameterize all smooth, well-behaved pdf's
in terms of their cumulants […]. Since the set of all cumulants of a pdf uniquely and completely specifies the pdf,
the Fisher information matrix relative to all the cumulants should represent, in effect, the fidelity of estimation of
the full pdf from data. The choice of cumulants to parameterize a pdf is a particularly convenient one since, as we
shall see, it leads to a simple analytical form for the Fisher information matrix. These concepts can be generalized
still further, as we shall argue, to the case of a discrete, integer-valued random variable. We shall present some useful
properties of these information measures, discuss their generating functions, and illustrate our considerations with
simple examples.
II. FISHER INFORMATION OF A CONTINUOUS RANDOM VARIABLE


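Given a continuous random variable $X$ distributed according to the pdf $p(x) > 0$ for all real $x$ (where $x$ is a
statistical realization of $X$), we may define a parameterized version of this pdf, $p(x|\theta) = p(x-\theta)$. Note that
$p(x|\theta) = p(x)$ if $\theta = 0$. The Fisher information of $p(x|\theta)$ with respect to $\theta$ is

$$
\begin{aligned}
J_\theta &= \left\langle \left( \frac{\partial \ln p(x|\theta)}{\partial \theta} \right)^2 \right\rangle \\
&= \int_{-\infty}^{\infty} \frac{dx}{p(x-\theta)} \left( \frac{\partial p(x-\theta)}{\partial \theta} \right)^2
 = \int_{-\infty}^{\infty} \frac{dx}{p(x)} \left( \frac{dp(x)}{dx} \right)^2 \equiv J(X),
\end{aligned}
\tag{1}
$$

where the angled brackets in the first line indicate the expectation value with respect to $p(x|\theta)$, and $J(X)$ is the
Fisher information of the random variable $X$. Note that because the pdf is positive for all real values of $x$, we can
integrate over the infinite interval, on which the substitution $x \to x + \theta$ leaves the integral unchanged; $J_\theta$ is
therefore manifestly independent of $\theta$. Thus, any location parameter may be used for $\theta$, and $J(X)$ is a functional
only of the shape of the distribution, independent of its absolute location on any axis.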
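As an illustrative check of Eq. (1), not part of the original analysis, the Python sketch below approximates $J(X)$ with a central finite difference for $p'(x)$ and the trapezoidal rule for the integral. The function names, integration window, grid, and step size are our own assumptions. For a Gaussian of standard deviation $\sigma$ the exact value is $1/\sigma^2$, and moving the mean should leave the result unchanged, as argued above.

```python
import math


def gaussian_pdf(x, mean=0.0, sigma=1.0):
    """Gaussian density, used here only as a convenient test case."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fisher_information(pdf, lo=-40.0, hi=40.0, points=40001, h=1e-5):
    """Approximate J(X) = integral of p'(x)^2 / p(x) dx, as in Eq. (1).

    p'(x) is a central finite difference with step h; the integral uses the
    trapezoidal rule on [lo, hi]. Grid points where p(x) underflows to zero
    are skipped, since their contribution is negligible.
    """
    dx = (hi - lo) / (points - 1)
    total = 0.0
    for k in range(points):
        x = lo + k * dx
        p = pdf(x)
        if p <= 0.0:
            continue
        dp = (pdf(x + h) - pdf(x - h)) / (2.0 * h)
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * dp * dp / p
    return total * dx


if __name__ == "__main__":
    for mean in (0.0, 3.0):  # J(X) should not depend on the location
        j = fisher_information(lambda x: gaussian_pdf(x, mean, 2.0))
        print(f"mean={mean}: J(X) = {j:.8f}  (exact 1/sigma^2 = 0.25)")
```

The finite-difference derivative keeps the sketch generic: any positive, smooth density can be passed in as `pdf`.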

The Fisher information of a random variable, $J(X)$, has two well-known interpretations […]. First, $J(X)$ quantifies
the statistical precision with which a location parameter $\theta$, on which the pdf depends translationally, can
be estimated from data drawn according to the pdf $p(x-\theta)$. Second, because $J(X)$ measures the
mean squared slope of the log-likelihood function $\ln p(x)$, it typically correlates with the narrowness of the pdf or,
equivalently, with the degree of statistical reproducibility of the values assumed by the variable $X$. In the Bayesian
context, this narrowness is related to the extent of prior knowledge about $X$. These two interpretations are related in
that a narrower pdf will provide higher "resolution" when used as a measurement tool for determining the location
parameter $\theta$.

A question of central interest in this correspondence is the following: What information measures characterize the 
fidelity of a statistical determination of the full pdf, not just its location parameter? We shall see presently that a 
particularly simple answer to this question can be obtained in terms of the Fisher information matrix elements relative 
to the cumulants of the pdf of the random variable. 



III. CUMULANTS OF A PDF AND ASSOCIATED FISHER INFORMATION 



Every pdf $p(x)$ has a unique characteristic function associated with it, given by its Fourier transform,

$$
M(i\nu) = \langle e^{i\nu x} \rangle = \int_{-\infty}^{\infty} dx\, p(x)\, e^{i\nu x},
\tag{2}
$$

which, in most cases, may be expressed as a power series in $\nu$ in terms of the moments of the pdf:

$$
M(i\nu) = \sum_{n=0}^{\infty} \frac{(i\nu)^n}{n!}\, \mu'_n,
\tag{3}
$$

where $\mu'_n = \langle x^n \rangle$ is the $n$th moment of the pdf (about 0). Writing the logarithm of the characteristic function in a
similar series form defines the cumulants of the pdf as coefficients in the series expansion […]

$$
L(i\nu) \equiv \ln M(i\nu) = \sum_{n=1}^{\infty} \frac{(i\nu)^n}{n!}\, \kappa_n.
\tag{4}
$$

Note that the function $L(i\nu)$ may be regarded as the generating function for the cumulants, since

$$
\kappa_n = \frac{1}{i^n} \left. \frac{d^n L(i\nu)}{d\nu^n} \right|_{\nu=0}.
\tag{5}
$$
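Because Eq. (4) defines the cumulants through $\ln M(i\nu)$, they can be obtained from the raw moments $\mu'_n$ of Eq. (3) without any Fourier transform. A minimal sketch, assuming the moments are supplied as a list: differentiating $M = e^{L}$ gives $M' = L' M$, and matching coefficients yields the recursion $\kappa_n = \mu'_n - \sum_{j=1}^{n-1} \binom{n-1}{j-1} \kappa_j\, \mu'_{n-j}$. The function name is hypothetical.

```python
from math import comb


def cumulants_from_moments(raw_moments):
    """Return [kappa_1, ..., kappa_N] from raw moments [mu'_1, ..., mu'_N].

    Uses kappa_n = mu'_n - sum_{j=1}^{n-1} C(n-1, j-1) kappa_j mu'_{n-j},
    which follows from M' = L' M with L = ln M (Eqs. 3 and 4).
    Integer or Fraction inputs give exact results.
    """
    mu = [1] + list(raw_moments)  # mu[0] = mu'_0 = 1 (normalization)
    kappa = [0] * len(mu)         # kappa[0] is unused
    for n in range(1, len(mu)):
        correction = sum(comb(n - 1, j - 1) * kappa[j] * mu[n - j] for j in range(1, n))
        kappa[n] = mu[n] - correction
    return kappa[1:]


if __name__ == "__main__":
    # Poisson(2): raw moments 2, 6, 22, 94; every cumulant equals 2.
    print(cumulants_from_moments([2, 6, 22, 94]))  # [2, 2, 2, 2]
```

For a zero-mean input the recursion reduces to the familiar relations such as $\kappa_4 = \mu_4 - 3\mu_2^2$; for example, central moments $(0, 2, 0, 20)$ give cumulants $(0, 2, 0, 8)$.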



The cumulants $\kappa_n$ are related to the mean and central moments of the pdf, the first few of them taking the form
$\kappa_1 = \mu'_1$, $\kappa_2 = \mu_2$, $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3\mu_2^2$, ..., where $\mu'_1$ and $\mu_2$ are the mean and variance of the pdf, while
its third and fourth central moments, $\mu_3$ and $\mu_4$, are related to its skewness and kurtosis. Since a pdf is related to its
characteristic function by a Fourier transform, we may thus parameterize $p(x)$ in terms of the entire collection of its
cumulants (indicated by $\kappa$) as the following inverse Fourier transform:

$$
p(x) \equiv p(x|\kappa) = \int_{-\infty}^{\infty} \frac{d\nu}{2\pi}\, e^{-i\nu x} \exp\!\left[ \sum_{n=1}^{\infty} \frac{(i\nu)^n}{n!}\, \kappa_n \right].
\tag{6}
$$
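Eq. (6) can be evaluated numerically when only a few cumulants are nonzero. The sketch below is an illustration under stated assumptions: the $\nu$-integral is truncated to $[-\nu_{\max}, \nu_{\max}]$ and done by the trapezoidal rule, and the names and grid are our own choices. For Gaussian cumulants $(\kappa_1, \kappa_2)$ it should reproduce the Gaussian density.

```python
import cmath
import math


def pdf_from_cumulants(x, kappas, nu_max=40.0, points=8001):
    """Evaluate p(x|kappa) of Eq. (6) by the trapezoidal rule in nu.

    kappas = [kappa_1, kappa_2, ...]. The nu-integral is truncated to
    [-nu_max, nu_max]; this assumes exp(sum ...) has decayed there, which
    holds for a Gaussian but fails, e.g., for a series truncated after a
    positive kappa_4 (the exponent then grows like +kappa_4 nu^4 / 24).
    """
    dnu = 2.0 * nu_max / (points - 1)
    total = 0j
    for k in range(points):
        nu = -nu_max + k * dnu
        log_char = sum(kap * (1j * nu) ** n / math.factorial(n)
                       for n, kap in enumerate(kappas, start=1))
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * cmath.exp(log_char - 1j * nu * x)
    # The imaginary part cancels for a real pdf; keep the real part.
    return (total * dnu / (2.0 * math.pi)).real


if __name__ == "__main__":
    # Mean 1, variance 0.25: compare with the Gaussian peak 1/sqrt(2*pi*0.25).
    print(pdf_from_cumulants(1.0, [1.0, 0.25]), 1.0 / math.sqrt(2.0 * math.pi * 0.25))
```

The divergence noted in the docstring is one concrete reason why, as discussed in the concluding remarks, the cumulants cannot take arbitrary values.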

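The Fisher information with respect to a set of real estimation parameters $\theta = \{\theta_1, \theta_2, \ldots\}$ is defined as a positive
semidefinite matrix with elements […]

$$
J^{(\theta)}_{mn} = \left\langle \frac{\partial \ln p(x|\theta)}{\partial \theta_m}\, \frac{\partial \ln p(x|\theta)}{\partial \theta_n} \right\rangle,
\tag{7}
$$

where the expectation value is taken over the pdf, $p(x|\theta)$, of the statistical data from which the parameters are
estimated. The superscript $(\theta)$ indicates the parameter set being used, and both $m$ and $n$ range over the indices of
the parameters in $\theta$. For a continuous pdf $p(x)$ that does not vanish at any finite value of $x$, this becomes

$$
J^{(\theta)}_{mn} = \int_{-\infty}^{\infty} \frac{dx}{p(x|\theta)}\, \frac{\partial p(x|\theta)}{\partial \theta_m}\, \frac{\partial p(x|\theta)}{\partial \theta_n}.
\tag{8}
$$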
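Noting from Eq. (6) that

$$
\frac{\partial p(x|\kappa)}{\partial \kappa_n} = \frac{(-1)^n}{n!}\, \frac{\partial^n p(x|\kappa)}{\partial x^n},
\tag{9}
$$

and using Eq. (8) with $\theta$ chosen as the cumulant vector $\kappa$, we obtain the Fisher information matrix relative to the
cumulants,

$$
J^{(\kappa)}_{mn} = \frac{(-1)^{m+n}}{m!\,n!} \int_{-\infty}^{\infty} \frac{dx}{p(x|\kappa)}\, \frac{\partial^m p(x|\kappa)}{\partial x^m}\, \frac{\partial^n p(x|\kappa)}{\partial x^n},
\qquad m, n = 0, 1, 2, \ldots
\tag{10}
$$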
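Eq. (9) can be spot-checked numerically for a Gaussian, whose only nonzero cumulants are $\kappa_1$ (the mean) and $\kappa_2$ (the variance). The sketch below, an illustration rather than part of the derivation, compares finite-difference derivatives with respect to $\kappa_1$ and $\kappa_2$ against $-\partial p/\partial x$ and $\tfrac{1}{2}\,\partial^2 p/\partial x^2$; the step size, parameter values, and test points are arbitrary assumptions.

```python
import math


def gauss(x, k1, k2):
    """Gaussian p(x|kappa) with mean k1, variance k2, higher cumulants zero."""
    return math.exp(-(x - k1) ** 2 / (2.0 * k2)) / math.sqrt(2.0 * math.pi * k2)


def eq9_pairs(x, k1=0.3, k2=1.7, h=1e-4):
    """Return (lhs, rhs) pairs of Eq. (9) for n = 1 and n = 2 at the point x."""
    # Left-hand sides: partial derivatives with respect to the cumulants.
    dp_dk1 = (gauss(x, k1 + h, k2) - gauss(x, k1 - h, k2)) / (2.0 * h)
    dp_dk2 = (gauss(x, k1, k2 + h) - gauss(x, k1, k2 - h)) / (2.0 * h)
    # Right-hand sides: (-1)^n / n! times the n-th derivative in x.
    d1 = (gauss(x + h, k1, k2) - gauss(x - h, k1, k2)) / (2.0 * h)
    d2 = (gauss(x + h, k1, k2) - 2.0 * gauss(x, k1, k2) + gauss(x - h, k1, k2)) / h ** 2
    return (dp_dk1, -d1), (dp_dk2, 0.5 * d2)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 0.5, 3.0):
        (a1, b1), (a2, b2) = eq9_pairs(x)
        print(f"x={x:+.1f}  n=1: {a1:+.8f} vs {b1:+.8f}   n=2: {a2:+.8f} vs {b2:+.8f}")
```

For $n = 2$ the identity is the familiar heat-equation property of the Gaussian, $\partial p/\partial \kappa_2 = \tfrac{1}{2}\,\partial^2 p/\partial x^2$.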



While $\kappa_0$ is not a true cumulant, including the possibility that $n$ and/or $m = 0$ in Eq. (10) leads formally to a
particularly convenient generating function for these elements, as we shall see in Sec. IV.

Equation (10) is the most important result of this correspondence. It defines a complete hierarchy of
information measures in matrix form, which we may call the cumulant information matrix (CIM). The diagonal elements of the
inverse of the CIM yield the full hierarchy of CRLB's on how precisely the various cumulants of $p(x|\kappa)$ may be
estimated from the statistical realizations of $X$ […], and thus, in a sense, how well the entirety of the pdf may be
estimated.

For $m = n = 1$, the CIM element reduces to $J(X)$, defined in Eq. (1) as the Fisher information of the random
variable. This result confirms our earlier interpretation of $J(X)$ as the precision with which a fiducial location
parameter of the pdf may be estimated. Without knowledge of any higher-order cumulants, it is the first cumulant
(the mean) that furnishes the most useful location parameter of a pdf.
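As a concrete sketch of Eq. (10), the code below evaluates $J^{(\kappa)}_{mn}$ for a Gaussian, whose $x$-derivatives are known in closed form: $\partial^n p/\partial x^n = (-1)^n \sigma^{-n}\, \mathrm{He}_n(z)\, p(x)$, with $z = (x - x_0)/\sigma$ and $\mathrm{He}_n$ the probabilists' Hermite polynomials. The integral is done by the trapezoidal rule over $\pm 12\sigma$, an assumed cutoff, and the function names are hypothetical. The results can be compared with the analytic Gaussian CIM, $J^{(\kappa)}_{mn} = \delta_{mn}/(n!\,\sigma^{2n})$, obtained in the Gaussian example (Sec. V).

```python
import math


def hermite_e(n, z):
    """Probabilists' Hermite polynomial He_n(z): He_{k+1} = z He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, z
    for k in range(1, n):
        h_prev, h = h, z * h - k * h_prev
    return h


def gaussian_derivative(n, x, mean, sigma):
    """n-th x-derivative of the Gaussian pdf: (-1)^n sigma^-n He_n(z) p(x)."""
    z = (x - mean) / sigma
    p = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return (-1) ** n * sigma ** (-n) * hermite_e(n, z) * p


def gaussian_cim_element(m, n, mean=0.0, sigma=1.0, half_width=12.0, points=12001):
    """J_mn of Eq. (10) for a Gaussian pdf, by the trapezoidal rule."""
    lo = mean - half_width * sigma
    dx = 2.0 * half_width * sigma / (points - 1)
    total = 0.0
    for k in range(points):
        x = lo + k * dx
        p = gaussian_derivative(0, x, mean, sigma)
        dm = gaussian_derivative(m, x, mean, sigma)
        dn = gaussian_derivative(n, x, mean, sigma)
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * dm * dn / p
    return (-1) ** (m + n) * total * dx / (math.factorial(m) * math.factorial(n))


if __name__ == "__main__":
    sigma = 1.5
    for n in range(1, 5):
        analytic = 1.0 / (math.factorial(n) * sigma ** (2 * n))
        print(n, gaussian_cim_element(n, n, sigma=sigma), analytic)
```

The $m = n = 1$ entry is $1/\sigma^2$, which is $J(X)$ for the Gaussian, in line with the reduction just noted.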



IV. A GENERATING FUNCTION FOR THE CUMULANT INFORMATION MATRIX 

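A useful technique for evaluating the CIM elements is the method of generating functions. Upon multiplying both
sides of Eq. (10) by $\lambda^m \mu^n$, then summing over all nonnegative integral values of $m$ and $n$, and finally interchanging
the order of integration and summations, a procedure justified by the uniform convergence of the involved Taylor
expansions, we obtain the following result:

$$
J^{(\kappa)}(\lambda, \mu) \equiv \sum_{m,n=0}^{\infty} J^{(\kappa)}_{mn}\, \lambda^m \mu^n
\tag{11}
$$

$$
= \int_{-\infty}^{\infty} \frac{dx}{p(x|\kappa)}
\left[ \sum_{m=0}^{\infty} \frac{(-\lambda)^m}{m!}\, \frac{\partial^m p(x|\kappa)}{\partial x^m} \right]
\left[ \sum_{n=0}^{\infty} \frac{(-\mu)^n}{n!}\, \frac{\partial^n p(x|\kappa)}{\partial x^n} \right].
\tag{12}
$$

The two pairs of square brackets in Eq. (12) enclose the Taylor expansions of $p(x-\lambda|\kappa)$ and $p(x-\mu|\kappa)$, respectively.
We thus arrive at a rather simple form of the function $J^{(\kappa)}(\lambda, \mu)$:

$$
J^{(\kappa)}(\lambda, \mu) = \int_{-\infty}^{\infty} dx\, \frac{p(x-\lambda|\kappa)\, p(x-\mu|\kappa)}{p(x|\kappa)}.
\tag{13}
$$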



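The function $J^{(\kappa)}(\lambda, \mu)$ given by Eq. (13) is a generating function for the CIM elements, since from Eq. (11) an
arbitrary element $J^{(\kappa)}_{mn}$ may be expressed as its partial derivative

$$
J^{(\kappa)}_{mn} = \frac{1}{m!\,n!} \left. \frac{\partial^{m+n}}{\partial \lambda^m\, \partial \mu^n}\, J^{(\kappa)}(\lambda, \mu) \right|_{\lambda=\mu=0}.
\tag{14}
$$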



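The elements $J^{(\kappa)}_{0n} = J^{(\kappa)}_{n0}$, as mentioned earlier, do not correspond to any information relative to any cumulants, and
can, in fact, be easily shown to vanish for $n \geq 1$, since Eq. (13) gives $J^{(\kappa)}(\lambda, 0) = \int dx\, p(x-\lambda|\kappa) = 1$ for all $\lambda$,
while $J^{(\kappa)}_{00} = 1$, the pdf normalization.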
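Eqs. (13) and (14) suggest a purely numerical route to low-order CIM elements: tabulate $J^{(\kappa)}(\lambda, \mu)$ by quadrature and differentiate it. The sketch below does this for the standard logistic density $p(x) = 1/[4\cosh^2(x/2)]$, chosen here only as a hypothetical test case because it is positive everywhere and its Fisher information is known to be $J(X) = 1/3$. The integration window, grid, and difference step $h$ are assumptions. It also confirms numerically that $J^{(\kappa)}(\lambda, 0) = 1$.

```python
import math


def logistic_pdf(x):
    """Standard logistic density, written as 1 / (4 cosh^2(x / 2)) for stability."""
    return 0.25 / math.cosh(0.5 * x) ** 2


def cim_generating_function(pdf, lam, mu, lo=-60.0, hi=60.0, points=24001):
    """J(lam, mu) = integral of p(x - lam) p(x - mu) / p(x) dx (Eq. 13), trapezoidal rule."""
    dx = (hi - lo) / (points - 1)
    total = 0.0
    for k in range(points):
        x = lo + k * dx
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * pdf(x - lam) * pdf(x - mu) / pdf(x)
    return total * dx


def j11_from_generating_function(pdf, h=1e-2):
    """J_11 = d^2 J / (d lam d mu) at lam = mu = 0 (Eq. 14, m = n = 1), central differences."""
    def J(a, b):
        return cim_generating_function(pdf, a, b)
    return (J(h, h) - J(h, -h) - J(-h, h) + J(-h, -h)) / (4.0 * h * h)


if __name__ == "__main__":
    print("J(0, 0)   =", cim_generating_function(logistic_pdf, 0.0, 0.0))
    print("J(0.3, 0) =", cim_generating_function(logistic_pdf, 0.3, 0.0))
    print("J_11      =", j11_from_generating_function(logistic_pdf), "(exact 1/3)")
```

The four-point stencil picks out only terms with $m$ and $n$ both odd, so its leading error is of order $h^2\,(J^{(\kappa)}_{13} + J^{(\kappa)}_{31})$.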



V. EXAMPLE: GAUSSIAN PDF



The Gaussian pdf provides an analytically tractable illustration of the results of this correspondence. For the
Gaussian pdf,

$$
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-x_0)^2/2\sigma^2},
\tag{15}
$$

its characteristic function is also Gaussian, and its cumulant generating function is thus quadratic,

$$
L(i\nu) = \ln M(i\nu) = i\nu x_0 - \frac{\nu^2 \sigma^2}{2}.
\tag{16}
$$

The first two cumulants are thus its mean $x_0$ and variance $\sigma^2$, while all higher-order cumulants vanish identically.
Parameterizing the Gaussian in terms of its cumulants $\kappa$, we obtain, very simply,

$$
p(x|\kappa) = \frac{1}{\sqrt{2\pi\kappa_2}}\, e^{-(x-\kappa_1)^2/2\kappa_2}.
\tag{17}
$$

The CIM generating function can be easily evaluated for the Gaussian pdf (17), for which the integrand in Eq. (13)
is also Gaussian and easily integrated, with the result

$$
J^{(\kappa)}(\lambda, \mu) = e^{\lambda\mu/\kappa_2} = e^{\lambda\mu/\sigma^2}.
\tag{18}
$$

The individual CIM elements then follow from a use of Eq. (14):

$$
J^{(\kappa)}_{mn} = \frac{\delta_{mn}}{n!\,\sigma^{2n}}, \qquad m, n = 0, 1, 2, \ldots
\tag{19}
$$
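The closed form (18) is easy to confirm by direct quadrature of Eq. (13). In the sketch below, an illustrative check with arbitrarily chosen $\lambda$, $\mu$, $x_0$, $\sigma$, and grid, the integrand is assembled in log space so that dividing by the small Gaussian tails cannot overflow. The function names are hypothetical.

```python
import math


def log_gaussian(x, mean, sigma):
    """Natural log of the Gaussian pdf with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def gaussian_cim_gf(lam, mu, mean=0.0, sigma=1.0, half_width=15.0, points=30001):
    """Trapezoidal quadrature of Eq. (13) for a Gaussian pdf centered at `mean`."""
    lo = mean - half_width * sigma
    dx = 2.0 * half_width * sigma / (points - 1)
    total = 0.0
    for k in range(points):
        x = lo + k * dx
        # log of p(x - lam) p(x - mu) / p(x)
        log_f = (log_gaussian(x - lam, mean, sigma) + log_gaussian(x - mu, mean, sigma)
                 - log_gaussian(x, mean, sigma))
        weight = 0.5 if k in (0, points - 1) else 1.0
        total += weight * math.exp(log_f)
    return total * dx


if __name__ == "__main__":
    lam, mu, sigma = 0.4, -0.7, 1.3
    print(gaussian_cim_gf(lam, mu, mean=2.0, sigma=sigma), math.exp(lam * mu / sigma ** 2))
```

As expected from Eq. (18), the result does not depend on the mean $x_0$.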

The diagonal nature of the CIM implies simple CRLB's on the variance of any set of unbiased estimators, $\{\hat\kappa_n\}$, of
the cumulants of the pdf:

$$
\mathrm{var}(\hat\kappa_n) = \langle (\hat\kappa_n - \kappa_n)^2 \rangle \geq n!\,\sigma^{2n}.
\tag{20}
$$

For general, biased estimators, the right-hand side of Eq. (20) must be replaced by the $n$th diagonal element of
the matrix $B I B^T$, where the matrix $I$ is the inverse of the CIM, which for the Gaussian case is diagonal with the
$n$th diagonal element equal to $n!\,\sigma^{2n}$; $B$ is the bias matrix, with elements $B_{mn} = \partial\langle\hat\kappa_m\rangle/\partial\kappa_n$; and $B^T$ is its transpose […].

The k-statistics furnish useful unbiased estimators of the cumulants of a distribution based on a finite sample drawn
from that distribution […]. It is well known that no cumulant estimator exists with a smaller variance than that of
the corresponding k-statistic. What our results show is that the k-statistics for a Gaussian pdf are also asymptotically
efficient estimators, in the sense that the CRLB (20) is achieved in the limit that the sample size $N \to \infty$. This
asymptotic efficiency has not been previously derived for any of the k-statistics of order higher than 2. The variances
of the first few k-statistics for the Gaussian pdf are given below, along with the associated CRLB's from the CIM
analysis, the latter given as the rightmost terms in the following expressions:

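$$
\mathrm{var}(k_1) = \frac{\sigma^2}{N} = \frac{\sigma^2}{N},
\tag{21a}
$$

$$
\mathrm{var}(k_2) = \frac{2\sigma^4}{N-1} > \frac{2\sigma^4}{N},
\tag{21b}
$$

$$
\mathrm{var}(k_3) = \frac{6N\sigma^6}{(N-1)(N-2)} > \frac{6\sigma^6}{N},
\tag{21c}
$$

$$
\mathrm{var}(k_4) = \frac{24N(N+1)\sigma^8}{(N-1)(N-2)(N-3)} > \frac{24\sigma^8}{N}.
\tag{21d}
$$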



(The CRLB for $N$ trials is $1/N$ times the CRLB for one trial.) Notice that for the Gaussian, $k_1$ is an efficient
estimator for any $N$, while the others are not efficient for finite $N$ (indicated by the strict inequality $>$) but are
asymptotically efficient in the limit $N \to \infty$. This is true of all higher-order k-statistics for the Gaussian as well.
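The comparison in Eqs. (21a) through (21d) can be reproduced by simulation. The sketch below assumes the standard textbook formulas for $k_1, \ldots, k_4$ in terms of the sample central moments $m_r$ (for example, $k_4 = N^2[(N+1)m_4 - 3(N-1)m_2^2]/[(N-1)(N-2)(N-3)]$); the sample size, number of trials, and random seed are arbitrary choices. For each order it reports the Monte Carlo variance, the exact variance of Eq. (21), and the CRLB $n!\,\sigma^{2n}/N$.

```python
import math
import random


def k_statistics(xs):
    """Unbiased k-statistics k_1..k_4 of a sample, via sample central moments m_r."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    k2 = n * m2 / (n - 1)
    k3 = n * n * m3 / ((n - 1) * (n - 2))
    k4 = n * n * ((n + 1) * m4 - 3 * (n - 1) * m2 * m2) / ((n - 1) * (n - 2) * (n - 3))
    return mean, k2, k3, k4


def exact_variance(order, n, sigma):
    """Variance of k_order for a Gaussian sample of size n (Eqs. 21a-21d)."""
    s2 = sigma * sigma
    if order == 1:
        return s2 / n
    if order == 2:
        return 2 * s2 ** 2 / (n - 1)
    if order == 3:
        return 6 * n * s2 ** 3 / ((n - 1) * (n - 2))
    if order == 4:
        return 24 * n * (n + 1) * s2 ** 4 / ((n - 1) * (n - 2) * (n - 3))
    raise ValueError("order must be 1..4")


def crlb(order, n, sigma):
    """Cumulant CRLB for n independent draws: order! * sigma^(2 order) / n (Eq. 20 over n)."""
    return math.factorial(order) * sigma ** (2 * order) / n


def monte_carlo_variances(n, sigma, trials=20000, seed=12345):
    """Sample variances of k_1..k_4 over repeated Gaussian samples of size n."""
    rng = random.Random(seed)
    draws = [k_statistics([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(trials)]
    variances = []
    for j in range(4):
        vals = [d[j] for d in draws]
        avg = sum(vals) / trials
        variances.append(sum((v - avg) ** 2 for v in vals) / (trials - 1))
    return variances


if __name__ == "__main__":
    n, sigma = 10, 1.0
    mc = monte_carlo_variances(n, sigma)
    for order in range(1, 5):
        print(order, round(mc[order - 1], 4), round(exact_variance(order, n, sigma), 4),
              round(crlb(order, n, sigma), 4))
```

With small $N$ the gap between the exact variance and the CRLB is clearly visible for orders 2 through 4, and it closes as $N$ grows.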

Two remarks are in order here. First, although any unbiased estimator of a third- or higher-order cumulant is, on
average, zero for the Gaussian pdf, there is a finite statistical scatter in the data from which the cumulants, regardless
of their order, are estimated. Second, the sharp increase of the CRLB's (20) with increasing $n$ is a reflection of the
sharply decreased probability of occurrence of values of a Gaussian variate in the wings of the pdf, to which the higher-order
cumulants are increasingly sensitive as a function of their order. This is in fact a general result for any localized
distribution: the CIM elements $J^{(\kappa)}_{mn} \to 0$ as $n, m \to \infty$ because of the factor of $m!\,n!$ in the denominator of Eq. (10),
resulting in CRLB's that increase without bound as the order of the cumulant being estimated increases.
VI. AN INVERSE CIM GENERATING FUNCTION



The CIM elements are of value only insofar as they give a general sense of how much information the data contain
about the cumulants of the pdf from which the data are drawn. It is the inverse of the CIM that is needed to establish
the CRLB, and such an inverse is often difficult to calculate. If a less stringent bound is acceptable, one can use the
reciprocal of the $n$th diagonal element of the CIM […] to bound the minimum variance from below. Such a bound,
though easy to write down, is not optimal, however: for any positive-definite matrix, $(J^{-1})_{nn} \geq 1/J_{nn}$, so the CRLB
is in general the greater lower bound. The desire for a simple way to calculate the inverse CIM motivates the following
attempt to define an appropriate generating function for the inverse CIM and relate it to the CIM generating function defined in Eq. (13).
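The gap between the two bounds is easy to see numerically. The sketch below inverts a small symmetric positive-definite matrix by Gauss-Jordan elimination and compares each diagonal entry of the inverse with the reciprocal of the corresponding diagonal entry. The matrix is a hypothetical stand-in for a truncated, non-diagonal CIM, not one computed from any particular pdf.

```python
def invert(matrix):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


# Hypothetical 3 x 3 truncated CIM (symmetric, positive definite), for illustration only.
cim = [[2.0, 0.6, 0.3],
       [0.6, 1.0, 0.4],
       [0.3, 0.4, 0.5]]
inv_cim = invert(cim)

if __name__ == "__main__":
    for n in range(3):
        print(f"n={n}: CRLB (J^-1)_nn = {inv_cim[n][n]:.4f}  >=  1/J_nn = {1.0 / cim[n][n]:.4f}")
```

Equality would hold only if the $n$th row and column of the CIM had no off-diagonal entries, as in the Gaussian case of Sec. V.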

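Let us define the generating function $I^{(\kappa)}(\nu, \eta)$ for the matrix elements $I_{mn}$ of the inverse of the CIM as

$$
I^{(\kappa)}(\nu, \eta) \equiv \sum_{m,n=0}^{\infty} I_{mn}\, \frac{\nu^m \eta^n}{m!\,n!},
\tag{22}
$$

from which the inverse CIM elements may be obtained as follows:

$$
I_{mn} = \left. \frac{\partial^{m+n}}{\partial \nu^m\, \partial \eta^n}\, I^{(\kappa)}(\nu, \eta) \right|_{\nu=\eta=0}.
\tag{23}
$$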



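The factorials, $m!$ and $n!$, are included in the definition (22) to allow appropriate convergence of the generating
function (compare Eq. …). We may also define two "marginal" generating functions,

$$
J^{(\kappa)}_n(\lambda) \equiv \sum_{m=0}^{\infty} J^{(\kappa)}_{mn}\, \lambda^m,
\qquad
I^{(\kappa)}_n(\nu) \equiv \sum_{l=0}^{\infty} I_{ln}\, \frac{\nu^l}{l!},
\tag{24}
$$

that are related to the full generating functions (13) and (22) by

$$
J^{(\kappa)}_n(\lambda) = \frac{1}{n!} \left. \frac{\partial^n J^{(\kappa)}(\lambda, \mu)}{\partial \mu^n} \right|_{\mu=0},
\qquad
I^{(\kappa)}_n(\nu) = \left. \frac{\partial^n I^{(\kappa)}(\nu, \eta)}{\partial \eta^n} \right|_{\eta=0}.
\tag{25}
$$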



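By multiplying the two marginals (24), summing over the index $n$ from 0 to $\infty$, and noting that the symmetric
matrices $I_{ml}$ and $J^{(\kappa)}_{ml}$ are mutual inverses, we obtain the following relation:

$$
\sum_{n=0}^{\infty} J^{(\kappa)}_n(\lambda)\, I^{(\kappa)}_n(\nu)
= \sum_{k,m=0}^{\infty} \left( \sum_{n=0}^{\infty} J^{(\kappa)}_{kn} I_{nm} \right) \frac{\lambda^k \nu^m}{m!}
= \sum_{k,m=0}^{\infty} \delta_{km}\, \frac{\lambda^k \nu^m}{m!}
= e^{\lambda\nu}.
\tag{26}
$$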



A Fourier analysis gives a second, integral relation,

$$
\frac{1}{2\pi} \int_{-\infty}^{\infty} d\mu \int_{-\infty}^{\infty} d\eta\, e^{-i\mu\eta}\, J^{(\kappa)}(\lambda, i\mu)\, I^{(\kappa)}(\nu, \eta) = e^{\lambda\nu},
\tag{27}
$$

as shown in the Appendix. These relations are valid for all values of $\lambda$ and $\nu$ in the complex plane. In general,
relations (26) and (27) cannot be solved analytically. The case of the Gaussian pdf is an exception, for which the
preceding relations can be solved and the generating function $I^{(\kappa)}(\nu, \eta)$ derived exactly,

$$
I^{(\kappa)}(\nu, \eta) = e^{\sigma^2 \nu\eta}.
\tag{28}
$$

Applying Eq. (23) to this result yields the same CRLB's (20) as obtained in Sec. V. While the CIM for the Gaussian
is admittedly trivial to invert, so that the generating-function method is not needed in this case, we
anticipate that in some cases it will prove easier to approximate the solution of either relation (26) or (27) than it
would be to invert the full CIM.
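For the Gaussian, Eq. (26) and the closed form (28) can be checked term by term. The sketch below builds truncated diagonal matrices with $J^{(\kappa)}_{nn} = 1/(n!\,\sigma^{2n})$ and $I_{nn} = n!\,\sigma^{2n}$ (the Gaussian CIM of Sec. V and its inverse), forms the marginals of Eq. (24), and compares the truncated sums with $e^{\lambda\nu}$ and $e^{\sigma^2\nu\eta}$. The truncation order and test arguments are arbitrary assumptions.

```python
import math


def gaussian_cim(sigma, order):
    """Truncated Gaussian CIM: J_nn = 1 / (n! sigma^(2n)), zero off the diagonal."""
    return [[1.0 / (math.factorial(n) * sigma ** (2 * n)) if m == n else 0.0
             for n in range(order)] for m in range(order)]


def gaussian_inverse_cim(sigma, order):
    """Its inverse: I_nn = n! sigma^(2n), zero off the diagonal."""
    return [[math.factorial(n) * sigma ** (2 * n) if m == n else 0.0
             for n in range(order)] for m in range(order)]


def marginal_sum(cim, inv, lam, nu):
    """Left-hand side of Eq. (26), built from the marginals of Eq. (24), truncated."""
    size = len(cim)
    j_marg = [sum(cim[m][n] * lam ** m for m in range(size)) for n in range(size)]
    i_marg = [sum(inv[k][n] * nu ** k / math.factorial(k) for k in range(size))
              for n in range(size)]
    return sum(a * b for a, b in zip(j_marg, i_marg))


def inverse_gf(inv, nu, eta):
    """Truncated inverse-CIM generating function of Eq. (22)."""
    size = len(inv)
    return sum(inv[m][n] * nu ** m * eta ** n / (math.factorial(m) * math.factorial(n))
               for m in range(size) for n in range(size))


if __name__ == "__main__":
    sigma, order = 1.2, 30
    cim, inv = gaussian_cim(sigma, order), gaussian_inverse_cim(sigma, order)
    print(marginal_sum(cim, inv, 0.7, -0.9), math.exp(0.7 * -0.9))
    print(inverse_gf(inv, 0.5, 0.8), math.exp(sigma ** 2 * 0.5 * 0.8))
```

Because the truncated matrices are exact inverses of each other, the truncated form of Eq. (26) holds term by term, and the remaining difference from $e^{\lambda\nu}$ is only the tail of the exponential series.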
VII. GENERALIZATION TO A DISCRETE RANDOM VARIABLE



While the Fisher information of a continuous random variable over the infinite interval may be defined through
the use of an arbitrary location parameterization, discrete distributions do not readily permit an analogous definition.
Although a discrete random variable cannot be shifted by a continuously valued location parameter, the class of discrete
distributions that are positive over the entirety of the integers can always be parameterized in terms of their cumulants
and a corresponding CIM defined.

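For a discrete random variable $X$ distributed according to $P(x)$, with $x$ ranging over all integers, we may parameterize
this distribution as

$$
P(x) \equiv P(x|\kappa) = \frac{1}{2\pi} \int_{-\pi}^{\pi} d\nu\, e^{-i\nu x} \exp\!\left[ \sum_{n=1}^{\infty} \frac{(i\nu)^n}{n!}\, \kappa_n \right],
\tag{29}
$$

where

$$
\kappa_n = \frac{1}{i^n} \left. \frac{d^n \ln\langle e^{i\nu x} \rangle}{d\nu^n} \right|_{\nu=0}
\tag{30}
$$

are the cumulants, in exact analogy to the continuous case. The only formal difference is in the limits of
integration in Eq. (29), reflecting the fact that $\langle e^{i\nu x} \rangle$ is now a Fourier series.¹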

The parameterized distribution $P(x|\kappa)$ coincides with $P(x)$ for all integral values of $x$, but it is also $C^\infty$ and
therefore defined for intermediate values of $x$ as well. These intermediate values are not probabilities of anything,
and they can range outside the interval $[0, 1]$. Still, because of its coincidence with $P(x)$ at integral $x$, $P(x|\kappa)$ may be
substituted for $P(x)$ when calculating an expectation value of any function of $X$. Most importantly, though, we may
now also take derivatives of this parameterized distribution with respect to $x$, which are related to partial derivatives
with respect to the distribution's cumulants exactly as in the continuous case, i.e.,

$$
\frac{\partial P(x|\kappa)}{\partial \kappa_n} = \frac{(-1)^n}{n!}\, \frac{\partial^n P(x|\kappa)}{\partial x^n}.
\tag{31}
$$

This allows us to define the discrete CIM elements exactly as in the continuous case, with the infinite integral replaced
by an infinite sum:

$$
J^{(\kappa)}_{mn} = \frac{(-1)^{m+n}}{m!\,n!} \sum_{x=-\infty}^{\infty} \frac{1}{P(x|\kappa)}\, \frac{\partial^m P(x|\kappa)}{\partial x^m}\, \frac{\partial^n P(x|\kappa)}{\partial x^n},
\qquad m, n = 0, 1, 2, \ldots
\tag{32}
$$

The possibility that $n$ or $m = 0$ is also included in this definition to assist in defining a generating function for
these matrix elements, namely

$$
J^{(\kappa)}(\lambda, \mu) = \sum_{x=-\infty}^{\infty} \frac{P(x-\lambda|\kappa)\, P(x-\mu|\kappa)}{P(x|\kappa)},
\tag{33}
$$

in exact analogy to the continuous case. Without modification, Eq. (14) may be used to generate the individual CIM
elements, and the generating function relations given in Eqs. (26) and (27) also hold.

This is a powerful result. While it is not possible to define the Fisher information of a discrete random
variable directly as in the continuous case, we can define an analogous quantity: the Fisher information with respect to the
mean of the cumulant-parameterized distribution. This is the element $J^{(\kappa)}_{11}$ of the discrete CIM. Since in the continuous
case $J^{(\kappa)}_{11} = J(X)$, we can define $J(X) \equiv J^{(\kappa)}_{11}$ for a discrete distribution. The CIM provides a complete
hierarchy of information measures with respect to the cumulants in the discrete case as well, and the inverse of the
CIM again gives the CRLB on any unbiased estimators of the cumulants.



¹For a random variable that has a one-sided discrete realization over the set of all nonnegative integers only, the characteristic function
$\langle \exp(i\nu x) \rangle$ is the z-transform [10] of the distribution $P(x)$, rather than its Fourier series, with $z = \exp(-i\nu)$. Notwithstanding this
difference, the parameterization of the distribution in terms of its cumulants is formally identical to that given in Eq. (29), and the
considerations of this section apply essentially unchanged.
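Eq. (29) can be evaluated directly once the cumulants are known. As a hypothetical example of a distribution that is positive on all of the integers, take the difference $X = N_1 - N_2$ of independent Poisson variables with means $\mu_1$ and $\mu_2$ (a Skellam distribution), whose cumulants are $\kappa_n = \mu_1 + (-1)^n \mu_2$. The sketch truncates the cumulant series at 60 terms and applies the trapezoidal rule on $[-\pi, \pi]$, both assumptions. At integer $x$ it should reproduce the probabilities obtained by direct convolution of the two Poisson distributions; at non-integer $x$ it returns the smooth interpolant $P(x|\kappa)$.

```python
import cmath
import math


def skellam_cumulants(mu1, mu2, count=60):
    """kappa_n = mu1 + (-1)^n mu2, n = 1..count, for X = N1 - N2 (independent Poissons)."""
    return [mu1 + (-1) ** n * mu2 for n in range(1, count + 1)]


def interpolant(x, kappas, points=512):
    """P(x|kappa) of Eq. (29): trapezoidal rule on [-pi, pi], truncated cumulant series."""
    coeffs = [kap / math.factorial(n) for n, kap in enumerate(kappas, start=1)]
    h = 2.0 * math.pi / points
    total = 0j
    for k in range(points + 1):
        nu = -math.pi + k * h
        log_char = sum(c * (1j * nu) ** n for n, c in enumerate(coeffs, start=1))
        weight = 0.5 if k in (0, points) else 1.0
        total += weight * cmath.exp(log_char - 1j * nu * x)
    return (total * h / (2.0 * math.pi)).real


def skellam_pmf(x, mu1, mu2, terms=200):
    """Reference probabilities P(N1 - N2 = x) by direct convolution."""
    def poisson(k, mu):
        return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    return sum(poisson(j + x, mu1) * poisson(j, mu2) for j in range(max(0, -x), terms))


if __name__ == "__main__":
    kap = skellam_cumulants(2.0, 1.5)
    for x in (-1, -0.5, 0, 0.5, 1):
        print(x, interpolant(x, kap))
```

At integer $x$ the integrand is $2\pi$-periodic, so the trapezoidal rule converges very rapidly there; between the integers the integrand is no longer periodic and the values are accurate only to the order of the grid spacing.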
VIII. CONCLUDING REMARKS



The Fisher information of a continuous random variable can be interpreted as the fidelity with which a fiducial
location parameter of a pdf (such as the mean) may be estimated from statistical data drawn according to that
distribution. In this correspondence, we have introduced a more comprehensive information measure, the cumulant information
matrix (CIM), whose inverse bounds the variance of any unbiased estimates of the cumulants of a pdf and, consequently, the
fidelity with which the entire pdf may be estimated. The Fisher information of the random variable is included in this
measure as the element $J^{(\kappa)}_{11}$. We have also extended the CIM concept to discrete random variables defined over the integers, for which the
notion of Fisher information of the random variable is otherwise ill defined. In addition, we have derived a generating function for the
CIM and given two relations between this generating function and a generating function for the inverse CIM, which we
hope will prove useful in calculating CRLB's. Further generalizations of this work could include defining a cumulant
information matrix or an analogous quantity for multivariate distributions. (See McCullagh [4] for information on
multivariate cumulants.)

We noted in Sections II and VII that the probability distribution to be parameterized must be strictly positive.
This was done to avoid singularities in the Fisher information calculations. As a general rule, the CIM for an arbitrary
random variable must be defined by restricting the interval of integration or summation to the actual sample space
of that variable. This is particularly important when a priori constraints like finiteness of support place restrictions
on the set of possible values of the allowed cumulant vectors. One constraint that is implicitly active in all of our
calculations is that the probability distribution, whether continuous or discrete, is restricted to nonnegative values.
While this is obvious from the standpoint of probability theory, it must be explicitly imposed on the space of all
estimable probability distributions. One need only consider the parameterization (6) to see why the cumulants $\kappa$
cannot take arbitrary values if the left-hand side of that equation is to remain nonnegative. The CRLB's calculated in
the present paper apply only to estimators that do not explore the boundaries of the parameter space beyond which
negative distributions are encountered. Incorporating edge effects will, in general, reduce the minimum variance of
a more general estimator, but such bounds are not calculable from the Fisher information matrix, since inequality
constraints do not affect its form. However, certain other constraints, such as support constraints, can alter the Fisher
information matrix and change the method that one must use to calculate CRLB's. The effects of constraints on
CRLB's are explored in detail by Gorman and Hero […].

The considerations of this paper may be relevant to the general area of inverse problems, such as those concerning
image restoration from noisy image data. A particularly useful viewpoint to adopt in discussions of image processing
is to treat the spatial distribution of intensity in an image, when properly normalized, as representing the probability
distribution of the emission or detection of a photon over the image. From this perspective, image restoration is
equivalent to the problem of estimating a probability distribution, the very problem we have discussed here. The
presence of noise in the actual image data greatly compounds this estimation problem, a subject that requires further
study. A noteworthy algorithm that makes essential use of this statistical viewpoint is the maximum entropy method
[12]. Blind deconvolution methods [13] provide another example: they rely on the existence of constraints to recover
the point-spread function as well as the source intensity distribution, both of which may be regarded as appropriate
pdf's.
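APPENDIX A: DERIVATION OF A GENERATING FUNCTION RELATION

To prove Eq. (27), we first note that

$$
\begin{aligned}
\int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-ixy}\, x^n y^m
&= \int_{-\infty}^{\infty} dx\, x^n \int_{-\infty}^{\infty} dy\, y^m e^{-ixy} \\
&= i^m \int_{-\infty}^{\infty} dx\, x^n \frac{d^m}{dx^m} \int_{-\infty}^{\infty} dy\, e^{-ixy} \\
&= 2\pi i^m \int_{-\infty}^{\infty} dx\, x^n\, \delta^{(m)}(x),
\end{aligned}
\tag{A1}
$$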
where $\delta^{(m)}(x)$ is the $m$th derivative of the Dirac δ-function. Integrating by parts $m$ times in the right-hand side of
Eq. (A1) reduces that integral to $(-1)^m$ times the $m$th derivative of $x^n$ evaluated at $x = 0$, namely to $(-1)^m m!\,\delta_{mn}$,
and the following identity results:

$$
\frac{1}{2\pi} \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-ixy}\, \frac{(ix)^n\, y^m}{m!} = \delta_{mn}.
\tag{A2}
$$



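Now, from the generating function definitions (11) and (22) and from the marginal definitions (24), it is clear that

$$
J^{(\kappa)}(\lambda, \mu) = \sum_{n=0}^{\infty} J^{(\kappa)}_n(\lambda)\, \mu^n
\quad \text{and} \quad
I^{(\kappa)}(\nu, \eta) = \sum_{n=0}^{\infty} I^{(\kappa)}_n(\nu)\, \frac{\eta^n}{n!}.
$$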



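Plugging these results into the left-hand side (LHS) of Eq. (27), we obtain

$$
\mathrm{LHS} = \frac{1}{2\pi} \int_{-\infty}^{\infty} d\mu \int_{-\infty}^{\infty} d\eta\, e^{-i\mu\eta}
\sum_{m=0}^{\infty} J^{(\kappa)}_m(\lambda)\, (i\mu)^m \sum_{n=0}^{\infty} I^{(\kappa)}_n(\nu)\, \frac{\eta^n}{n!}
\tag{A3}
$$

$$
\begin{aligned}
&= \sum_{m,n=0}^{\infty} J^{(\kappa)}_m(\lambda)\, I^{(\kappa)}_n(\nu)\, \frac{1}{2\pi} \int_{-\infty}^{\infty} d\mu \int_{-\infty}^{\infty} d\eta\, e^{-i\mu\eta}\, \frac{(i\mu)^m\, \eta^n}{n!} \\
&\overset{(a)}{=} \sum_{m,n=0}^{\infty} J^{(\kappa)}_m(\lambda)\, I^{(\kappa)}_n(\nu)\, \delta_{mn}
= \sum_{n=0}^{\infty} J^{(\kappa)}_n(\lambda)\, I^{(\kappa)}_n(\nu)
\overset{(b)}{=} e^{\lambda\nu},
\end{aligned}
\tag{A4}
$$

where Eq. (A2) was used in step (a), and Eq. (26) allowed step (b). Thus, relation (27) is established.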